
<!DOCTYPE html
  PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
   
      <title>18.6.&nbsp;优化字符串操作</title>
      <link rel="stylesheet" href="../diveintopython.css" type="text/css">
      <link rev="made" href="mailto:f8dy@diveintopython.org">
      <meta name="generator" content="DocBook XSL Stylesheets V1.52.2">
      <meta name="keywords" content="Python, Dive Into Python, tutorial, object-oriented, programming, documentation, book, free">
      <meta name="description" content="Python from novice to pro">
      <link rel="home" href="../toc/index.html" title="Dive Into Python">
      <link rel="up" href="index.html" title="第&nbsp;18&nbsp;章&nbsp;性能优化">
      <link rel="previous" href="list_operations.html" title="18.5.&nbsp;优化列表操作">
      <link rel="next" href="summary.html" title="18.7.&nbsp;小结">
   </head>
   <body>
      <table id="Header" width="100%" border="0" cellpadding="0" cellspacing="0" summary="">
         <tr>
            <td id="breadcrumb" colspan="5" align="left" valign="top">导航：<a href="../index.html">起始页</a>&nbsp;&gt;&nbsp;<a href="../toc/index.html">Dive Into Python</a>&nbsp;&gt;&nbsp;<a href="index.html">性能优化</a>&nbsp;&gt;&nbsp;<span class="thispage">优化字符串操作</span></td>
            <td id="navigation" align="right" valign="top">&nbsp;&nbsp;&nbsp;<a href="list_operations.html" title="上一页: “优化列表操作”">&lt;&lt;</a>&nbsp;&nbsp;&nbsp;<a href="summary.html" title="下一页: “小结”">&gt;&gt;</a></td>
         </tr>
         <tr>
            <td colspan="3" id="logocontainer">
               <h1 id="logo"><a href="../index.html" accesskey="1">深入 Python :Dive Into Python 中文版</a></h1>
               <p id="tagline">Python 从新手到专家 [Dip_5.4b_CPyUG_Release]</p>
            </td>
            <td colspan="3" align="right">
               <form id="search" method="GET" action="http://www.google.com/custom">
                  <p><label for="q" accesskey="4">Find:&nbsp;</label><input type="text" id="q" name="q" size="20" maxlength="255" value=""> <input type="submit" value="搜索"><input type="hidden" name="domains" value="woodpecker.org.cn/diveintopython"><input type="hidden" name="sitesearch" value="www.woodpecker.org.cn/diveintopython"></p>
               </form>
            </td>
         </tr>
      </table>
      <!--#include virtual="/inc/ads" -->
      <div class="section" lang="zh_cn">
         <div class="titlepage">
            <div>
               <div>
                  <h2 class="title"><a name="soundex.stage4"></a>18.6.&nbsp;优化字符串操作
                  </h2>
               </div>
            </div>
            <div></div>
         </div>
         <div class="abstract">
            <p>Soundex 算法的最后一步是对短结果补零和截短长结果。最佳的做法是什么？</p>
         </div>
         <p>这是目前在 <tt class="filename">soundex/stage2/soundex2c.py</tt> 中的做法：
         </p>
         <div class="informalexample"><pre class="programlisting">
    digits3 = re.sub(<span class='pystring'>'9'</span>, <span class='pystring'>''</span>, digits2)
    <span class='pykeyword'>while</span> len(digits3) &lt; 4:
        digits3 += <span class='pystring'>"0"</span>
    <span class='pykeyword'>return</span> digits3[:4]
</pre></div>
         <p>这里是 <tt class="filename">soundex2c.py</tt> 的表现：
         </p>
         <div class="informalexample"><pre class="screen">
<tt class="prompt">C:\samples\soundex\stage2&gt;</tt><span class="userinput">python soundex2c.py</span>
<span class="computeroutput">Woo             W000 12.6070768771
Pilgrim         P426 14.4033353401
Flingjingwaller F452 19.7774882003</span>
</pre></div>
         <p>思考的第一件事是以循环取代正则表达式。这里的代码来自 <tt class="filename">soundex/stage4/soundex4a.py</tt>：
         </p>
         <div class="informalexample"><pre class="programlisting">
    digits3 = <span class='pystring'>''</span>
    <span class='pykeyword'>for</span> d <span class='pykeyword'>in</span> digits2:
        <span class='pykeyword'>if</span> d != <span class='pystring'>'9'</span>:
            digits3 += d
</pre></div>
         <p><tt class="filename">soundex4a.py</tt> 快了吗？是的：
         </p>
         <div class="informalexample"><pre class="screen">
<tt class="prompt">C:\samples\soundex\stage4&gt;</tt><span class="userinput">python soundex4a.py</span>
<span class="computeroutput">Woo             W000 6.62865531792
Pilgrim         P426 9.02247576158
Flingjingwaller F452 13.6328416042</span>
</pre></div>
         <p>但是，等一下。一个从字符串去除字符的循环？我们可以用一个简单的字符串方法做到。这便是  <tt class="filename">soundex/stage4/soundex4b.py</tt>：
         </p>
         <div class="informalexample"><pre class="programlisting">
    digits3 = digits2.replace(<span class='pystring'>'9'</span>, <span class='pystring'>''</span>)
</pre></div>
         <p><tt class="filename">soundex4b.py</tt> 快了吗？这是个有趣的问题，它取决输入值：
         </p>
         <div class="informalexample"><pre class="screen">
<tt class="prompt">C:\samples\soundex\stage4&gt;</tt><span class="userinput">python soundex4b.py</span>
<span class="computeroutput">Woo             W000 6.75477414029
Pilgrim         P426 7.56652144337
Flingjingwaller F452 10.8727729362</span>
</pre></div>
         <p><tt class="filename">soundex4b.py</tt> 中的字符串方法对于大多数名字比循环快，但是对于短小的情况 (很短的名字) 却比 <tt class="filename">soundex4a.py</tt> 略微慢些。性能优化并不总是一致的，对于一个情况快些，却可能对另外一些情况慢些。就此而言，大多数情况将会从改变中获益，所以就改吧，但是别忘了原则。
         </p>
         <p>最后仍很重要的是，让我们检测算法的最后两步：以零补齐短结果和截短超过四字符的长结果。你在  <tt class="filename">soundex4b.py</tt> 中看到的代码就是做这个工作的，但是太没效率了。看一下 <tt class="filename">soundex/stage4/soundex4c.py</tt> 找出原因：
         </p>
         <div class="informalexample"><pre class="programlisting">
    digits3 += <span class='pystring'>'000'</span>
    <span class='pykeyword'>return</span> digits3[:4]
</pre></div>
         <p>我们为什么需要一个 <tt class="literal">while</tt> 循环来补齐结果？我们早就知道我们需要把结果截成四字符，并且我们知道我们已经有了至少一个字符 (直接从 <tt class="varname">source</tt> 中拿过来的起始字符)。这意味着我们可以仅仅在输出的结尾添加三个零，然后截断它。不要害怕重新理解问题，从不太一样的角度看问题可以获得简单的解决方案。
         </p>
         <p>我们丢弃 <tt class="literal">while</tt> 循环后从 <tt class="filename">soundex4c.py</tt> 中获得怎样的速度？太明显了：
         </p>
         <div class="informalexample"><pre class="screen">
<tt class="prompt">C:\samples\soundex\stage4&gt;</tt><span class="userinput">python soundex4c.py</span>
<span class="computeroutput">Woo             W000 4.89129791636
Pilgrim         P426 7.30642134685
Flingjingwaller F452 10.689832367</span>
</pre></div>
         <p>最后，还有一件事可以令这三行运行得更快：你可以把它们合并为一行。看一眼 <tt class="filename">soundex/stage4/soundex4d.py</tt>：
         </p>
         <div class="informalexample"><pre class="programlisting">
    <span class='pykeyword'>return</span> (digits2.replace(<span class='pystring'>'9'</span>, <span class='pystring'>''</span>) + <span class='pystring'>'000'</span>)[:4]
</pre></div>
         <p>在 <tt class="filename">soundex4d.py</tt> 中把所有代码放在一行可以比 <tt class="filename">soundex4c.py</tt> 稍微快那么一点：
         </p>
         <div class="informalexample"><pre class="screen">
<tt class="prompt">C:\samples\soundex\stage4&gt;</tt><span class="userinput">python soundex4d.py</span>
<span class="computeroutput">Woo             W000 4.93624105857
Pilgrim         P426 7.19747593619
Flingjingwaller F452 10.5490700634</span>
</pre></div>
         <p>它非常难懂，而且优化也不明显。这值得吗？我希望你有很好的见解。性能并不是一切。你在优化方面的努力应该与程序的可读性和可维护性相平衡。</p>
      </div>
      <table class="Footer" width="100%" border="0" cellpadding="0" cellspacing="0" summary="">
         <tr>
            <td width="35%" align="left"><br><a class="NavigationArrow" href="list_operations.html">&lt;&lt;&nbsp;优化列表操作</a></td>
            <td width="30%" align="center"><br>&nbsp;<span class="divider">|</span>&nbsp;<a href="index.html#soundex.divein" title="18.1.&nbsp;概览">1</a> <span class="divider">|</span> <a href="timeit.html" title="18.2.&nbsp;使用 timeit 模块">2</a> <span class="divider">|</span> <a href="regular_expressions.html" title="18.3.&nbsp;优化正则表达式">3</a> <span class="divider">|</span> <a href="dictionary_lookups.html" title="18.4.&nbsp;优化字典查找">4</a> <span class="divider">|</span> <a href="list_operations.html" title="18.5.&nbsp;优化列表操作">5</a> <span class="divider">|</span> <span class="thispage">6</span> <span class="divider">|</span> <a href="summary.html" title="18.7.&nbsp;小结">7</a>&nbsp;<span class="divider">|</span>&nbsp;
            </td>
            <td width="35%" align="right"><br><a class="NavigationArrow" href="summary.html">小结&nbsp;&gt;&gt;</a></td>
         </tr>
         <tr>
            <td colspan="3"><br></td>
         </tr>
      </table>
      <div class="Footer">
         <p class="copyright">Copyright © 2000, 2001, 2002, 2003, 2004 <a href="mailto:mark@diveintopython.org">Mark Pilgrim</a></p>
         <p class="copyright">Copyright © 2001, 2002, 2003, 2004, 2005, 2006, 2007 <a href="mailto:python-cn@googlegroups.com">CPyUG (邮件列表)</a></p>
      </div>
   </body>
</html>